Non-destructive monitoring of forming quality of self-piercing riveting via a lightweight deep learning

Self-piercing riveting (SPR) has been widely used in automobile body jointing. However, the riveting process is prone to various forming quality failures, such as empty riveting, repeated riveting, substrate cracking, and other riveting defects. This paper combines deep learning algorithms to achieve non-contact monitoring of SPR forming quality. And a lightweight convolutional neural network with higher accuracy and less computational effort is designed. The ablation and comparative experiments results show that the lightweight convolutional neural network proposed in this paper achieves improved accuracy and reduced computational complexity. Compared with the original algorithm, the algorithm’s accuracy in this paper is increased by 4.5\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\%$$\end{document}%, and the recall is increased by 1.4\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\%$$\end{document}%. In addition, the amount of redundant parameters is reduced by 86.5\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\%$$\end{document}%, and the amount of computation is reduced by 47.33\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\%$$\end{document}%. This method can effectively overcome the limitations of low efficiency, high work intensity, and easy leakage of manual visual inspection methods and provide a more efficient solution for monitoring the quality of SPR forming quality.

www.nature.com/scientificreports/ Currently, the conventional destructive section method, curve tolerance zone method 11 , and human visual appearance quality inspection are mainly used to check the quality of SPR. Except for the tolerance zone method , which can detect the molding process of all joints, the other methods described above are not suitable for large-scale rapid inspection, often for sampling inspection. At present, it is widely acknowledged that the more dependable quality monitoring approach is still the tolerance zone method, that is, the tolerance band of the qualified joint is generated by the force-displacement curve and offset of the formed qualified joint. The forming joint beyond the tolerance band is considered unqualified, and the tolerance band method is not only insensitive to lateral displacement but also unable to monitor defects such as rivet offset, sheet cracking, button cracking, button drop, etc., and there is a large detection blind spot. These defects are then often a key factor affecting the reliability of the joint. Therefore, the monitoring of the forming quality of SPR molding is a difficult problem, and how to build a comprehensive and efficient SPR quality inspection system to improve the reliability and practicability of the SPR process needs further research.
In recent years, with the rapid development of artificial intelligence algorithms, and the accumulation of raw data in various disciplines is increasing. Deep learning algorithms are now widely employed in mechanical 12 , biomedical 13,14 , civil engineering 15,16 , materials 17 , and other sectors 18 , allowing for more efficient problem-solving in a variety of fields. In the field of mechanical smart manufacturing, digital twins are combined with reinforcement learning to extend to more complex manufacturing systems, using deep neural networks to solve the challenges posed by large state and action spaces 19 . In the field of biomedicine, the pathological diagnosis and analysis of medical images can be effectively realized through deep learning algorithms 20 ; In the monitoring of structure health, combined with the full convolutional network algorithm to build a crack detection system for steel structures, long-term and effective monitoring of cracks in steel structures can be effectively realized 21 . The effective prediction of material structure design and performance can be realized by deep learning 22,23 . And  www.nature.com/scientificreports/ the SPR process combined with artificial neural network (ANN) can quickly achieve parameter optimization. Through SPR technology combined with back propagation neural network, the interlock and residual bottom thickness etc. are optimized through ANN 24 . The SPR molding process can easily lead to various forming defects due to differences in rivet quality, sheet quality differences and equipment stability. Due to the relatively small defects, especially the cracks in the buttons, it is difficult to find by artificial vision, and these defective joints often become the starting point of failure. This is one of the reasons why this paper proposes a new quality monitoring method to solve this limitation. The appearance quality of the joint is mainly detected by means of spot checks, relying on the judgment of engineers. The formed joints with defects are often only a minority, and the sampling method can easily lead to the omission of abnormal joints, leaving a safety hazard, and if the defect is comprehensively checked, it is extremely labor-intensive. In addition, there are no reports in the literature on how to monitor for SPR external defects. In order to solve the safety mentioned above hazards of SPR defects and further improve the SPR forming quality monitoring system, we propose using visual recognition technology to monitor the external defects of SPR effectively. This paper proposes a non-contact and long-distance non-destructive detection method for SPR defects based on computer vision. Aiming at the actual engineering application scenarios of SPR and the problems of large numbers of deep learning parameters, large amounts of computation, and obstruction of engineering applications. A convolutional neural network algorithm with greater accuracy and less computation is proposed, which accomplishes the optimization of the algorithm's light weight and accuracy when combined with inverted residual with linear bottleneck, depthwise separable convolution, and adaptive weighting approach.
Appearance quality monitoring methods SPR forming defects and causes of formation. This paper summarizes the common types of defects in the self-pierce riveting forming process from a large number of formed joints, as shown in Fig. 3. The empty riveting was mainly due to the stuttering of the conveying rivet mechanism, and the rivet cannot be got out normally. As a result, no rivets appeared during the riveting process, and the punch was squeezed directly against the upper sheet to form a rivet-free joint. Repeated riveting is often due to a failure of the rivet feeding mechanism to send out two rivets at the same time, resulting in two rivets stacked on the upper sheet. Sheet cracking is due to individual sheet quality problems or some difference in the local mechanical properties of the sheet caused by cracking in the forming process. Molding failures may be due to fluctuations in the performance of the SPR machine, or failure of the riveting limit that causes the punch to form differently from the preset, thereby affecting the joint forming. Rivet flip due to the reverse of rivet when it is loaded into the feeding mechanism.
According to the quality evaluation standards of SPR, the appearance of SPR molding joints is required to have no cracks, full buttons, and the rivet head height and sheet are flush.
Deep learning algorithms. Traditional image processing algorithms rely on target color and texture features to obtain image feature information, and face a series of challenges such as complex background, defect diversity, brightness, contrast, blur, and resolution variation. Deep learning is not limited to pixel values, and achieves 'understanding' of image information in a training manner. Deep learning algorithms are suitable for more complex scenarios, and have stronger generalization capabilities in the face of the changes of background, brightness, target scale, etc.
Deep learning algorithms are mainly for defect processing, mainly object detection, image segmentation, image classification algorithms, etc. The processing process of the image segmentation algorithm needs to traverse www.nature.com/scientificreports/ all the pixels in the image, and the calculation cost of the algorithm is large and is slow, and it is not suitable for the application of fast riveting scenes. For the image classification algorithm, if two riveting defects occur in the input image at the same time, it is prone to incorrect classification, the field of view is limited, and the defect cannot be located. Compared with the above two class of algorithms, the object detection algorithm is more lightweight and flexible, and can realize the positioning, classification, and counting of the target at the same time, and can realize the detection of multiple targets at the same time. Due to the complexity of the SPR scene, the joint has a high degree of similarity with the background, the riveting range is large and the field of view is large, and the metal surface is susceptible to the influence of light, which brings certain noise to the image. Therefore, the object detection algorithm is more suitable and efficient in the SPR scenario.
Overview of appearance inspection method. In the process of SPR, a variety of appearance quality failures are prone to occur. For this reason, this paper proposes a new non-contact long-distance vision inspection method for the SPR appearance forming quality problem, and the overall implementation process is shown in Fig. 4. First, the image of the molded joint is obtained through a vision sensor. Acquire defective connector images at different backgrounds, resolutions, brightness, and viewing angles. Types of defects include sheet cracking, button cracking, button shedding, repeated riveting, empty riveting, etc. The data enhancement of the acquired images can effectively expand the amount of data while improving the robustness and generalization ability of the algorithm, and ensuring the reliability of the algorithm in different scenarios. The dataset is then randomly divided into training sets, validation sets, and testing sets, and corresponding annotation files are made. Finally, the algorithm is trained, verified and tested, and the performance of the algorithm is measured by evaluation indicators.

Methodology
In this paper, we combine deep learning algorithms to monitor the quality of SPR, and optimize the feature extraction network and feature fusion network based on the Yolov5 25 algorithm for SPR application scenarios to ensure high detection accuracy and low computational overhead. The framework of the proposed algorithm consists of three structures: feature extraction network (Backbone), feature fusion network (Neck) and object detection (Head). Among them, the Backbone is composed of the inverted residual with linear bottleneck (IRBottleneck), depthwise separable convolution (DWCBL), convolutional block attention mechanism (CBAM) and spatial pyramid pooling (SPP), while the Neck is composed of traditional convolutional blocks, feature multiplexing modules (C3) and CBAM, and object detection follows the mesh positioning method of Yolov5 algorithm. The overall architecture of the algorithm in this paper is shown in Fig. 5.
Depthwise separable convolution. In order to efficiently extract the underlying feature information and reduce the number of redundant parameters and computation, Backbone is combined with depthwise separable convolution. Depthwise separable convolution can effectively achieve on convolutional layer acceleration, Xception 26 , MobileNetV1 27 and other algorithms have achieved algorithm lightweighting by combining depthwise separable convolution. Using this new convolution operation, replacing the traditional convolutional struc- www.nature.com/scientificreports/ ture can effectively reduce the amount of computation and parameter. The underlying features are extracted primarily by the Backbone. The GhostNet experimental findings reveal that the shallow feature layers have relatively comparable characteristics, and that the shallow features have a linear relationship 28 . Combined with depthwise separable convolution can simultaneously achieve the extraction of underlying features and the depth of the extended network, and at the same time, can ensure a certain linear relationship between shallow features. The difference between depthwise separable convolution and conventional convolution is shown in Fig. 6. When the  www.nature.com/scientificreports/ dimensions of the input and output are the same, the depthwise separable convolution is equivalent to splitting the conventional convolution into two steps. First by depthwise convolution and then by pointwise convolution to adjust the number of channels. Taking Fig. 6 as an example, the parameters of conventional convolution are 4 × 3 × 3 × 3 = 108 , while the total parameter amount of depthwise separable convolution using depth is 39, of which the parameter amount of depthwise convolution is 3 × 3 × 3 , and the parameter of pointwise convolution The parameter amount of depthwise separable convolution is 1/3 of that of conventional convolution. The reduction of the amount of parameters can effectively reduce the display pressure, calculation pressure and communication pressure of ordinary equipment.
A lightweight strategy for IRBottleneck. AlexNet 29 has aroused a wave of application of convolutional neural networks in various fields due to its amazing performance on the ImageNet dataset. Various algorithms with higher accuracy have appeared one after another, such as Single shot multibox detector(SSD) 30 , CenterNet 31 , etc. However, the performance improvement obtained by building more complex, deeper, and wider convolutional layers requires higher computing resources. These algorithms often exceed the computing power of most mobile and embedded devices, and devices with high computing power become the threshold for algorithm application.
In order to achieve the balance between algorithm precision and computational consumption, a new mobile framework in the field of computational vision, namely IRBottleneck, is used to construct Backbone, which can be used on mobile devices and various tasks 32 . The IRBottleneck structure is shown in Fig. 7. First, the input low-dimensional features are expanded to high-dimensional space through 1 × 1 convolution, and deepening the structure of the intermediate layer can minimize the loss of information during the transfer process. The higher the output dimension of the transfer layer, the less information is lost. The features are further extracted using depthwise convolutional filters, and finally the features are mapped back to the low-dimensional space again through 1 × 1 convolution and linear activation function. The information flow of the region of interest (ROI) is stored in some channels in the layer structure. By reducing the dimension of the operation space, the information of these ROI can be effectively captured and utilized. And a linear activation function is used in the final output of the module, which effectively avoids the collapse of feature information and leads to information loss. This structure is suitable for mobile devices, reduces the calculation amount of the algorithm inference process, and significantly reduces the memory occupation of the device.
Adaptive weighting strategy. Attention is a simple and efficient feature-adaptive reinforcement mechanism, and the attention mechanism is widely used in natural language processing 33 , speech recognition 34 , object detection and classification 35 , etc. To understand attention from a more essential perspective, the attention mechanism is to assign higher weights to more important target information, and to assign less weight to unimportant information, so as to allocate computing resources reasonably. It can be seen from the experimental results of many algorithms on public dataset 36,37 that combining the attention mechanism can better handle complex image information and improve the algorithm's understanding of the ROI.
The lightweight process of the algorithm often leads to a decrease in performance. At the same time, with the deepening of the convolution layer and the use of a large number of downsampling, the target information will be lost to a certain extent, especially for small targets or targets similar to the background impact is more serious. The algorithm in this paper utilizes the CBAM in the Backbone and Neck to enhance the feature information from the channel and space as a whole. Using CBAM can effectively strengthen target information, weaken unnecessary background information, and effectively improve the representation of ROI. It can effectively improve the performance of the algorithm with very little parameter amount and computational overhead. The overall architecture of CBAM is shown in Fig. 8.
CBAM mainly contains two structures, namely channel attention and spatial attention. Each channel of channel attention is equivalent to a filter, which can strengthen the high-level semantic information in the channel, and the main role of spatial attention is to effectively enhance the contour representation of the ROI. The overall implementation mechanism of CBAM is: feature mapping for an input F ∈ R C×H×W . The one-dimensional channel attention map M c ∈ R C×1×1 is obtained through the channel attention module, and the F ′ is obtained by multiplying with the input feature, and the two-dimensional spatial attention map ( M s ∈ R 1×H×W ) is obtained www.nature.com/scientificreports/ by using the spatial attention module, and finally F ′ is multiplied with Ms to obtain F ′′ , the above calculation formula is as follows: Eqs. (1) and (2).
The way to make a channel attention map using input features is shown in Fig. 9. First, average pooling and max pooling are performed along the spatial direction to aggregate the channel information of each feature. Then add them through the shared multi-layer perceptron(MLP) network respectively, and finally obtain the channel attention map through the sigmoid function. The acquisition principle of the channel attention map is as shown in formula (3).
Spatial information mainly includes target position and contour information, combined with the spatial attention map, it can effectively strengthen the expression of the area of interest. The acquisition method of the two-dimensional spatial attention map is shown in Fig. 10. The maximum pooling and average pooling are performed along the channel direction respectively, and the results of the two pooling are spliced and compressed and fused by 7 × 7 convolution, and finally converted into weight values by the sigmoid function.

Experimental verification
Dataset and experimental environment. Obtaining a large number of defective riveted joints directly on the automobile body will consume a lot of manpower and material resources, and is too expensive, which is impractical in terms of time and cost. For this reason, standard riveted joints are made by referring to the sampling method of GB 2649-89 welded joint mechanical performance test, and the riveting situation with a large interval and a large range is simulated. At the same time, select a part of the sheet for riveting in a denser way to simulate the scene of dense riveting. The SPR experimental platform and visualization part of the dataset are shown in Fig. 11. While increasing the amount of data, it also tests the actual performance of the algorithm.
(1)   www.nature.com/scientificreports/ Due to the characteristics of the SPR forming process, the differences in the shape of different joints after SPR forming are relatively small. Similarly, the feature differences of the same type of defects are relatively small. Therefore, in the process of constructing the dataset, under the premise of having an adequate amount of data, it is necessary to ensure the completeness of the defect types and the diversity of the data to fit the network model well. Therefore, images of all possible defect types that may appear in the SPR process were collected under different scenes, including backgrounds, brightness, resolution, and stacking of riveting parts. The defect types include cracks in thin plates, button cracks, forming failures, rivet flipping, empty rivets, double rivets, and high rivet heads. A total of 242 original images were collected. Then, data augmentation methods such as rotation, brightness variation, and color transformation were used to expand the data and increase the diversity of the dataset, resulting in 434 images in total. Data augmentation, as a cheap way to expand data, can effectively avoid overfitting, reduce sensitivity to data, and improve the robustness and generalization ability of the algorithm 38 .
The 434 images obtained were used to label the data labeling software Labelimg to make training set, validation set, and testing set, with 70% , 13% , and 17% respectively. Assign more and harder to detect data to the testing set than to the validation set. To ensure fair comparisons between algorithms, all algorithmic environments were configured on a desktop (Intel(R) Core (TM) i7-10700k CPU @3.80GHz x16 and GeForce RTX 2080 SUPER GPU) with a training epoch of 500, a batch-size of 6, and the rest of the parameters were kept default.
Ablation studies overview. This paper will systematically explore the influence of IRBottleneck, depthwise separable convolution and attention mechanism on the performance of the algorithm through ablation experiments, so as to design a more efficient algorithm. Ablation experiments will be conducted from the following perspectives, respectively, by validating MoblileNetV2 as the Backbone of the Yolov5 algorithm (+MoblieNet); IRBottleneck combined with conventional convolution as the Backbone (+IRB+C); IRBottleneck combined with depthwise separable convolution as the Backbone (+IRB+D); the IRBottleneck combined with depthwise separable convolution is used to construct the Backbone, and then the attention mechanism is introduced on the Backbone and Neck as proposed in this paper (+IRB+D+A(Our)). The specific ablation experiments are shown in Table 1. At the same time, it is compared with algorithms such as SSD, CenterNet, and Yolov5.
All algorithms are experimented on the same dataset, and the loss function allows to analyze the error between the predicted and ground truth of the algorithm training process. The classification loss function for training uses the Binary Cross Entropy loss function, which is used to measure the accuracy of sample classification, and its expression is shown in Eq. (4), and the bounding box regression uses the L CIoU function 39 , whose expressions are shown in Eqs. (5)- (7).
y represents the probability that the algorithm predicts that the i-th sample is a certain class, y (i) represents the label, and N represents the total number of samples. Figure 11. Experimental equipment, specimen types, and dataset construction. www.nature.com/scientificreports/ ρ denotes the distance between the two points of the solution, b and b gt denote the centroids of the prediction bounding box and the ground truth box, respectively. c denotes the minimum diagonal length of the two bounding boxes, α is the equilibrium parameter. α is 0 when intersection over union (IoU) is less than 0.5, and α is V /(1 − IoU) + V when IoU is higher than 0.5. The similarity of the aspect ratio is measured by V, and the width-height information is considered in the regression process of the bounding boxes. w gt , h gt and w, h denote the width and height of the ground truth box and the prediction bounding box, respectively. The training results are shown in Fig. 12. In addition, the impact of the optimization module on the classification loss and bounding box loss will be analyzed through the loss value. Fig. 12a is the object classification loss, Fig. 12b is the bounding box regression loss, and Fig. 12c is the total loss. After the network training, it can be seen from the analysis of the classification loss, bounding box loss and total loss of training that the classification loss value of all algorithms has reached a small value, and it can be seen from Fig. 12a that a IRBottleneck is added to the algorithm, which can make the algorithm have better convergence and quickly reach a smaller loss value. After algorithm training, all algorithms are tested.
Algorithm performance evaluation metrics. This paper will evaluate the algorithm performance, space complexity, and time complexity through multiple dimensions such as precision, recall, mean average precision (mAP), Giga floating point operations per second (GFLOPs), and the number of model parameters, where precision is used to measure the detection accuracy of the algorithm. The algorithm's capacity to locate all positive classifications is measured by recall, which is used to assess the algorithm's degree of missed detection. The greater the recall, the less likely it is to be missed. GFLOPs is a general indicator used to measure model complexity, and it is also commonly used to measure the speed of neural network models. The larger the GFLOPs, the more complex the model, the more computing resources the model occupies, and the higher the device performance requirements. The formula for precision and recall are shown in Eqs. (8) and (9). The formula for mAP is shown in Eqs. (10) and (11). Because there is only one category in this paper, average precision (AP) is equal to mAP. The mAP is different under different IoU thresholds. The IoU is calculated as shown in Fig. 13.  Quantitative analysis of algorithm performance. The algorithms mentioned above and comparative algorithms for ablation experiments are tested to analyze the performance of different algorithms comprehensively. Table 2 shows the quantitative comparison results of different algorithms. Precision, recall, mAP 0.5 , parameters, and GFLOPs are selected to analyze the algorithm's overall performance. The verification performance reflects the learning ability of the algorithm to the data to a certain extent, and the test performance shows the application effect of the algorithm, reflecting the strength of the general ability of the algorithm.
It can be seen from Table 2 that the network constructed by integrating the MoblieNet architecture (Experiment 2) can effectively reduce the number of parameters, which is 87% lower than the original Yolov5 algorithm, and the computational cost is reduced by 63.6% . Moreover, the precision has been greatly improved. However, the overall performance of +MobileNet has dropped significantly. Its mAP 0.5 dropped by 4 % , respectively, and the recall dropped by 7.7% . The direct combination of the MoblieNet achieves a significant reduction in the number of parameters and computation, thus achieving the maximum weight reduction of the algorithm. However, the significant reduction in the number of parameters of the feature extraction network and the too small channel of the feature extraction network make the memory capacity of the feature extraction network drop rapidly, which may easily lead to a significant drop in the overall performance.
The analysis results of experiments 3 and 4 show that using the linear inverted residual module and standard convolution or depthwise separable convolution to build a new backbone can effectively reduce the number of algorithm parameters and computational overhead at the same time, and the performance of the algorithm in verification both are very close to the Yolov5 algorithm. This strategy can effectively improve the recall and mAP 0.5 , and the performance improvement is more apparent when combined with traditional convolution, but traditional convolution also introduces more parameters and calculations. Experiment 5 is combined with an adaptive weighting strategy based on experiment 4. From the results of experiment 5, it can be seen that combining CBAM can effectively improve the recall and precision of the test. At the same time, the number of parameters introduced and the amount of computation can be negligible. Fusion CBAM can achieve the simultaneous improvement of improving each performance index while ensuring the lightweight of the algorithm. The analysis of the experimental results of SSD and CenterNet (Experiments 6 and 7) shows that the precision, recall, and mAP 0.5 of both are relatively low. SSD and CenterNet perform poorly in scenarios where the target is occluded, the background is highly similar to the target itself, and the target scale is variable. In addition, compared with the algorithm in this paper, the number of parameters and computation of both are significant.
The algorithm in this paper achieves both accuracy improvement and algorithm lightweight. Compared with Yolov5, the precision is increased by 4.5% , the recall rate is increased by 1.4% , and mAP 0.5 is increased by 0.4% . In addition, the redundant parameters are reduced by 86.5% , and the calculation amount is reduced by 47.33% . Compared with experiments 3-4, the increased parameters and computational overhead are negligible. Table 2 systematically analyzes the performance of all algorithms, that is, the comprehensive performance of the algorithm. This chapter will further analyze the actual detection effect of the algorithm through a qualitative comparison algorithm. The detection effect of the algorithm related to the abla- www.nature.com/scientificreports/ tion experiment is shown in Fig. 14. Each row represents the detection results of each algorithm, with missed detections and false detections marked with yellow and red rectangles, respectively. The overall analysis shows that although all the comparison algorithms can detect riveting defects, they have a certain degree of missed or false detection. Because the target resolution is large or the target and the background are similar, +MobileNet is prone to missed detection. +IRB+C and +IRB+D have more false detections and a small number of missed detections, to a certain extent, because of the reduction in the number of model parameters. From the detection results of +IRB+D+A(Our), it can be found that adaptive weighting can reduce false detection and missed detection, which is consistent with the quantitative analysis results. A qualitative comparison of the different algorithms is shown in Fig. 15. CenterNet has the most missed detections and errors among all the comparison algorithms, and there is a problem with multiple detection boxes for one target. SSD is easy to miss detection of occluded targets, and when the shape of the target is similar to the background, SSD is prone to misidentification. When the target is blurry or highly similar to the background, the Yolov5 algorithm will cause misrecognition. The method in this paper can effectively reduce the number of model parameters and calculations, maximize the detection accuracy, and reduce the missed detection rate and false detection. From the qualitative comparison results, it can be concluded that the method in this paper not only has high detection precision, but also can achieve better robust detection in scenes such as occlusion, scale change, blur, and brightness change.

Conclusion
In this paper, an end-to-end deep learning algorithm is proposed to supervise the appearance quality of SPR, and the Backbone and Neck are optimized for the application scenarios of SPR technology. By combining the IRBottleneck and the depthwise separable convolution, the algorithm is lightweight, and the completeness of the target information in the feature extraction process is guaranteed while reducing the amount of parameters and computation. The adaptive weighting strategy is used to realize the adaptive weighting of the key feature information in the information flow, which effectively strengthens the characterization of the ROI. Ablation www.nature.com/scientificreports/ experiments verify the accuracy and rationality of the proposed algorithm. The experimental results show that the algorithm's detection precision, recall , and mAP 0.5 are improved by 4.5% , 1.4% , and 0.4% , respectively, and the calculation amount is reduced by 47.33% . The algorithm in this paper improves the detection accuracy while effectively reducing the missed detection rate, false detection, memory usage, and computational complexity.
The main contributions of this paper can be summarized as follows: (1) Change the traditional manual visual method, and propose a method based on computer vision to realize intelligent monitoring of external defects of SPR, which can effectively monitor SPR defects such as double riveting, sheet cracking, empty riveting, and button micro-cracks. (2) The inverted residual with linear bottleneck is proposed to realize the optimization of the algorithm, which significantly reduces the amount of calculation and parameters, making the algorithm more conducive to deployment and application in engineering. (3) An adaptive weighted optimization method is proposed, which can effectively improve the precision and recall of the algorithm under the condition of negligible parameters and computational overhead.
SPR involves numerous process parameters, including pre-tightening force, puncture force, holding force, stroke, and die. In most cases, there are multiple sets of optional parameters available for connecting thin sheet materials, but only one set of parameters will produce the least amount of defects. Algorithms can be used to detect and tally the types and quantities of defects under different process parameters, thereby achieving optimization of the process parameters. Furthermore, our approach can be used in conjunction with the curve tolerance band method to construct a more efficient SPR defect detection system, providing a new method for improving the SPR supervision system.

Data availibility
The datasets generated during and/or analysed during the current study are not publicly available due to ongoing development of the algorithms but are available from the corresponding author on reasonable request.